Multiword expressions in spontaneous speech: do we really speak like that?
نویسندگان
چکیده
In this study, we examined the pronunciation characteristics of multiword expressions (MWEs). We first drew up an inventory of frequently occurring N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. For about 10% of these Ngrams phonetic transcriptions were available, which were examined. Our results show that the pronunciation of these Ngrams differed to a large extent from the canonical form. In order to determine whether this is a general characteristic of spontaneous speech or rather the effect of the specific status of these N-grams, we analyzed the pronunciations of the individual words composing the N-grams in two context conditions: 1) in the N-gram context and 2) in any other context. We found that words in N-grams do indeed have peculiar pronunciation patterns. This seems to suggest that these N-grams may be considered as MWEs that should therefore be treated as lexical entries with their own specific pronunciation variants in the pronunciation lexicons used for e.g. automatic speech recognition (ASR) and automatic phonetic transcription (APT).
منابع مشابه
Multiword expressions in spoken language: An exploratory study on pronunciation variation
The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained...
متن کاملAutomatic Extraction of Fixed Multiword Expressions
Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their s...
متن کاملIntroduction to the special issue on multiword expressions: Having a crack at a hard nut
Multiword expressions are an integral part of language. Their heterogeneous characteristics have proved a challenge to both linguistic and computational analysis. Their importance to language technology has long been recognised. In this special issue we include ten papers which propose a variety of approaches for finding and handling these expressions, both for building general purpose lexical ...
متن کاملMicrosoft Word - Finding More Non-supersingular Elliptic Curves for Pairing..
Machine Translation, (hereafter in this document referred to as the "MT") faces a lot of complex problems from its origination. Extracting multiword expressions is also one of the complex problems in MT. Finding multiword expressions during translating a sentence from English into Urdu, through existing solutions, takes a lot of time and occupies system resources. We have designed a simple rela...
متن کاملMULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey
Multiword Expressions are idiosyncratic word usages of a language which often have noncompositional meaning. The knowledge of multiword expressions is necessary for many NLP tasks like, machine translation, natural language generation, named entity recognition, sentiment analysis etc. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...
متن کامل